Statistical data validation methods for large cheese plant database.

Authors

  • S A Jimenez-Marquez
  • C Lacroix
  • J Thibault
Abstract

Production data of the cheesemaking process are used to monitor milk fat and protein recoveries in cheese, cheese yield, and composition and eventually to predict these parameters. Due to the large impact of these factors on cheese quality and plant profitability, it is very important to use reliable data for analysis, modeling, and control of the process. This paper tested six methods for detecting erroneous data in industrial cheesemaking databases. The data analyzed came from 4 yr of stirred-curd Cheddar cheese production in an industrial cheesemaking facility, comprising over 10,000 vats. Single vat outliers were detected using a simple statistical criterion of mean ± 3.6 SD on single variable distributions, Fourier series modeling of seasonal variables (fat, protein, lactose, and total solids in milk, and protein in whey), and the multivariate Mahalanobis outlier analysis. Detection of outlier productions (corresponding to several vats) was done by applying the mean ± 3.6 SD criterion to variables obtained through calculating the fat mass balance, fat retention coefficient, and yield efficiency. Data treatment enabled the detection of outlier data, but also pinpointed variables with a low reliability (manually registered times). Single variable and multivariable methods proved complementary, and the use of both types of methods is recommended when validating an existing database.
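
To illustrate why the abstract calls single variable and multivariable methods complementary, the following is a minimal sketch of two of the named criteria: the mean ± 3.6 SD rule and a Mahalanobis distance check. It is not the authors' implementation. The data are synthetic (made-up fat and protein percentages, not plant records), the names `sd_outliers`, `mahalanobis_sq`, and `P_CUTOFF` are hypothetical, and the chi-square cutoff used for the Mahalanobis distance is an assumption, since the abstract does not state one.

```python
import math
import random
import statistics

K_SD = 3.6        # univariate threshold quoted in the abstract
P_CUTOFF = 0.999  # assumed chi-square probability level (not from the abstract)


def sd_outliers(values, k=K_SD):
    """Return the indices of values lying outside mean +/- k * SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {i for i, v in enumerate(values) if abs(v - mean) > k * sd}


def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for 2x2
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Synthetic stand-in for milk composition (% fat, % protein), correlation ~0.9.
rng = random.Random(0)
fat, protein = [], []
for _ in range(500):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    fat.append(3.8 + 0.15 * z1)
    protein.append(3.3 + 0.10 * (0.9 * z1 + math.sqrt(1 - 0.9 ** 2) * z2))

# Vat 500: extreme on both variables.
# Vat 501: each value is plausible alone, but the pair contradicts the
# fat-protein correlation (high fat together with low protein).
fat += [4.8, 4.0]
protein += [3.9, 3.1]

univariate = sd_outliers(fat) | sd_outliers(protein)

# Chi-square quantile with 2 degrees of freedom has the closed form -2*ln(1-p).
cutoff = -2.0 * math.log(1.0 - P_CUTOFF)
d2 = mahalanobis_sq(fat, protein)
multivariate = {i for i, d in enumerate(d2) if d > cutoff}

print("univariate flags:", sorted(univariate))
print("multivariate flags:", sorted(multivariate))
```

In this sketch, vat 501 passes the single variable rule on both fat and protein yet is flagged by the Mahalanobis check, which is the kind of case where the two approaches complement each other. With real data the covariance matrix would cover more than two variables, and a matrix library would replace the hand-written 2x2 inverse.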

Similar resources

Detection of plant oil addition to cheese by synchronous fluorescence spectroscopy

The fraudulent addition of plant oils during the manufacturing of hard cheeses is a real issue for the dairy industry. Considering the importance of monitoring adulterations of genuine cheeses, the potential of fluorescence spectroscopy for the detection of cheese adulteration with plant oils was investigated. Synchronous fluorescence spectra were collected within the range of 240 to 700 nm wit...

Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling.

Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible meth...

A Pipeline to Automate the Updating of a Specialized Protein Database

Motivation: The growing number of specialized databases in molecular biology, coupled with the huge increase in the availability of molecular data, necessitates the development of automatic methods for finding and adding relevant information to these databases. Results: We show how a general protein database (Swiss-Prot) can be used as a source of data for a more specialized one (TCDB, the Tran...

Reducing the probability of false positive research findings by pre-publication validation – Experience with a large multiple sclerosis database

BACKGROUND Published false positive research findings are a major problem in the process of scientific discovery. There is a high rate of lack of replication of results in clinical research in general, multiple sclerosis research being no exception. Our aim was to develop and implement a policy that reduces the probability of publishing false positive research findings. We have assessed the uti...

Mutaphrase: Paraphrasing with FrameNet

We describe a preliminary version of Mutaphrase, a system that generates paraphrases of semantically labeled input sentences using the semantics and syntax encoded in FrameNet, a freely available lexicosemantic database. The algorithm generates a large number of paraphrases with a wide range of syntactic and semantic distances from the input. For example, given the input “I like eating cheese”,...

Journal title:
  • Journal of Dairy Science

Volume 85, Issue 9

Pages: -

Publication date: 2002